Optimization Algorithms for Kernel Methods
Abstract
Kernel methods are learning systems which embed the input data in a high-dimensional feature space. The embedding is implemented via kernel functions which act as a dot product in a Reproducing Kernel Hilbert Space (RKHS). Learning algorithms which require only the canonical dot product of data represented in finite-dimensional linear spaces can be generalized to operate in an RKHS. This generalization, called the kernel trick, is implemented by substituting a kernel function for the dot products. The technique is useful whenever the kernel function represents the similarity between data better or more efficiently than the canonical dot product.

This thesis concentrates mainly on Support Vector Machines (SVM) learning of classifiers, which is a particular example of kernel methods. Learning of the SVM classifier aims to minimize the empirical error while, at the same time, controlling the complexity of the classification rule. The learning is expressed as a specific convex Quadratic Programming (QP) task. The design of QP solvers is a challenging problem for at least two reasons: (i) large training data are often available for learning, which gives rise to large QP tasks with many variables, and (ii) the solvers should be fast because many instances of the SVM classifier must be learned during the model selection stage.

The first part of the thesis concentrates on QP solvers based on known algorithms for the Minimal Norm Problem (MNP) and the Nearest Point Problem (NPP). The main contributions of the thesis are: (i) the known algorithms for the MNP and NPP are described in a common framework, (ii) the algorithms are generalized to solve more complex QP tasks and proofs of convergence in a finite number of iterations are given, and (iii) a novel, faster QP solver is proposed. The proposed QP solvers are simple to implement and were successfully tested on problems with hundreds of thousands of variables.
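For concreteness, learning the common soft-margin binary SVM classifier from examples (x_i, y_i), y_i ∈ {−1, +1}, leads to a dual QP task of the following textbook form; the thesis studies solvers for this family of tasks, and its specific formulations (e.g., the variants reducible to the MNP and NPP) may differ in details:

```latex
\max_{\boldsymbol{\alpha}} \;
  \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m}
    \alpha_i \alpha_j \, y_i y_j \, k(\mathbf{x}_i, \mathbf{x}_j)
\quad \text{subject to} \quad
\sum_{i=1}^{m} \alpha_i y_i = 0, \qquad
0 \le \alpha_i \le C, \; i = 1, \dots, m.
```

The number of variables equals the number of training examples m, which is why large training sets translate directly into large QP tasks.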
A novel greedy algorithm for approximation of the training data embedded in the RKHS is proposed in the second part of the thesis. The method is called Greedy Kernel Principal Component Analysis (Greedy KPCA). Similarly to ordinary Kernel Principal Component Analysis (KPCA), the aim is to minimize the squared reconstruction error of the approximated data. In contrast to ordinary KPCA, Greedy KPCA also aims to find a simple model in which the data are described. The proposed method is suitable for reducing the computational and memory requirements of kernel methods, and it can also be applied to reduce the complexity of functions learned by kernel methods. It was experimentally verified that the method can reduce the computational complexity of Kernel Least Squares Regression and can speed up the evaluation of the SVM classifier.
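A minimal sketch of the greedy subset-selection idea follows; it is written as a pivoted partial decomposition of the kernel matrix that repeatedly adds the training point with the largest current squared reconstruction error. The RBF kernel, the function names, and the stopping rule are illustrative assumptions, not the thesis's exact algorithm.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gram matrix of the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def greedy_subset(X, m, gamma=1.0, tol=1e-10):
    """Greedily pick up to m points whose span in the RKHS approximates all of X.

    Equivalent to a pivoted partial Cholesky factorization of the kernel
    matrix; err[i] tracks the squared reconstruction error of point i when
    projected onto the span of the points selected so far.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)        # full Gram matrix (kept simple here)
    err = np.diag(K).copy()            # error w.r.t. the empty span
    G = np.zeros((n, m))               # partial factor, K ~ G @ G.T
    selected = []
    for j in range(m):
        i = int(np.argmax(err))        # worst-approximated point becomes the pivot
        if err[i] < tol:               # everything is already well approximated
            break
        selected.append(i)
        g = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(err[i])
        G[:, j] = g
        err = np.maximum(err - g ** 2, 0.0)   # update residual errors
    return selected, err
```

Projecting the data onto the span of the selected points then yields the reduced representation; kernel expansions restricted to such a subset are what makes the downstream kernel method cheaper to evaluate.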
Basic SVM learning is formulated for binary classifiers. A multi-class formulation exists; however, it leads to a considerably more complex QP task compared to the binary case. A novel method which allows transforming the learning of the multi-class SVM into the learning of a single-class SVM classifier is proposed in the third part of the thesis. The transformation is based on a simplification of the original problem and on employing Kesler's construction. The entire transformation is performed solely by a specially designed kernel function. As a result, any solver for the single-class SVM problem can be readily used to solve the multi-class problem.
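To sketch the idea of Kesler's construction (a simplified illustration under common assumptions; the exact kernel designed in the thesis may differ): each training example (x_i, y_i) with y_i ∈ {1, …, M} is replaced by M−1 virtual examples z_i^y, one for every y ≠ y_i, where z_i^y carries the bias-augmented feature vector in block y_i and its negative in block y of an M-block compound space. The dot product of two virtual examples then depends on the original data only through the kernel:

```latex
\langle \mathbf{z}_i^{y}, \mathbf{z}_j^{u} \rangle
  = \bigl( k(\mathbf{x}_i, \mathbf{x}_j) + 1 \bigr)
    \bigl( \delta(y_i, y_j) - \delta(y_i, u) - \delta(y, y_j) + \delta(y, u) \bigr),
```

where δ(a, b) equals 1 if a = b and 0 otherwise. Reading the right-hand side as a kernel over the pairs (x_i, y) is what lets an unmodified single-class SVM solver be applied to the multi-class problem.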
The proposed method was successfully applied to the learning of Optical Character Recognition systems for a commercial application.

The proposed methods were incorporated into the Statistical Pattern Recognition (STPR) toolbox http://cmp.felk.cvut.cz/~xfrancv/stprtool written in Matlab. A substantial part of the toolbox was designed and implemented by the author of the thesis. The toolbox contains an ensemble of pattern recognition techniques, e.g., methods for learning linear discriminant functions, feature extraction, density estimation and clustering, Support Vector Machines, various kernel methods, etc.

Resumé

Kernel methods are learning systems that represent the input data in a high-dimensional feature space by means of kernel functions. The feature space is in this case a Reproducing Kernel Hilbert Space (RKHS). Learning algorithms that work only with dot products of data represented in a finite-dimensional linear space can be generalized to work in an RKHS. This generalization, known as the "kernel trick", is performed by replacing the dot products with a chosen kernel function. This technique is suitable in cases where the kernel functions express the similarity between data better or more efficiently than the canonical dot product.

This dissertation focuses on the classifier learning method called Support Vector Machines (SVM), which is a typical example of kernel methods. When learning an SVM classifier, the goal is to minimize the empirical error while keeping the complexity of the classifier low. The learning problem is expressed as a convex Quadratic Programming (QP) optimization problem. The design of optimizers for SVM learning is a hard problem mainly for two reasons: (i) one typically has to learn from large training data, which leads to QP tasks with many variables, and (ii) the optimization must be fast, because many SVM classifiers have to be learned during the model selection stage.

The first part of the dissertation is devoted to the design of QP optimizers based on known algorithms for the so-called Minimal Norm Problem (MNP) and Nearest Point Problem (NPP) from computational geometry. The main contributions of the dissertation are: (i) the known algorithms solving the MNP and NPP are described in a common framework, (ii) these algorithms are generalized to solve a more complex QP problem and their convergence in a finite number of steps is proven, and (iii) a new, faster algorithm is proposed. The proposed algorithms are simple and were successfully tested on large problems with hundreds of thousands of variables.

In the second part of the dissertation, a greedy algorithm for approximating the training data in an RKHS is proposed. The method is called Greedy Kernel Principal Component Analysis (Greedy KPCA). As with classical Kernel Principal Component Analysis (KPCA), the goal is to minimize the reconstruction error of the approximated data. In contrast to KPCA, Greedy KPCA additionally tries to describe the data with a simple model. The proposed method is suitable for reducing the memory and computational demands of kernel methods and, at the same time, for simplifying the analytical description of the functions learned by kernel methods. It was shown experimentally that the proposed method can be used to reduce the computational demands of the Kernel Least Squares regression method and to speed up the SVM classifier.

The basic version of SVM learning is designed only for the case of binary classifiers. A formulation of SVM learning for multi-class classifiers exists, but it leads to a QP problem substantially harder than the binary case. In the third part of the dissertation, a transformation is proposed that makes it possible to convert the problem of learning a multi-class SVM classifier into the problem of learning a single-class classifier. The transformation is based on a simplification of the original problem and on the use of Kesler's construction. The entire transformation is carried out solely by means of a specially designed kernel function. This means that any method for learning a single-class SVM classifier can be immediately used for learning a multi-class classifier.

The proposed method was successfully used in the design of a commercial character recognition system. All proposed methods were incorporated into the Statistical Pattern Recognition Toolbox http://cmp.felk.cvut.cz/~xfrancv/stprtool for Matlab. A substantial part of the toolbox was designed and implemented by the author of this dissertation. The toolbox contains a collection of pattern recognition tools, such as methods for learning linear discriminant functions, methods for probability density estimation and clustering, Support Vector Machines, kernel methods, and others.

Acknowledgement

I am indebted to my supervisor, Professor Václav Hlaváč, for guiding me and for giving me the opportunity to study at the Center for Machine Perception. He showed me how to conduct research and helped me with the research for and the writing of this thesis. I also wish to thank Professor Michail Ivanovič Schlesinger, whose work was a source of many ideas and largely formed my view of pattern recognition. My special thanks go to Professor Mirko Navara for carefully reading and commenting on the thesis; his suggestions and comments had a substantial impact on its final form. Last but not least, I would like to thank all my colleagues at CMP, who made it possible for me to work in an inspiring and pleasant atmosphere. The research presented in this thesis was supported by the Ministry of Education, Youth and Sports of the Czech Republic under grant No. MSM6840770013, by the European Commission under project IST-004176 COSPAL, by the Czech Science Foundation under project GACR 102/03/0440, by the Austrian Ministry of Education under project CONEX GZ 45.535, and by the EU INTAS project PRINCESS 04-77-7347.

Similar resources
A path following interior-point algorithm for semidefinite optimization problem based on new kernel function
In this paper, we obtain some new complexity results for solving the semidefinite optimization (SDO) problem by interior-point methods (IPMs). We define a new proximity function for SDO via a new kernel function. Furthermore, we formulate an algorithm for a primal-dual interior-point method (IPM) for SDO using this proximity function and give its complexity analysis, and then we sho...
An Interior Point Algorithm for Solving Convex Quadratic Semidefinite Optimization Problems Using a New Kernel Function
In this paper, we consider convex quadratic semidefinite optimization problems and provide a primal-dual Interior Point Method (IPM) based on a new kernel function with a trigonometric barrier term. The iteration complexity of the algorithm is analyzed under some mild, easy-to-check conditions. Although our proposed kernel function is neither a Self-Regular (SR) fun...
Primal-Dual Interior-Point Algorithms for Semidefinite Optimization Based on a Simple Kernel Function
Interior-point methods (IPMs) for semidefinite optimization (SDO) have been studied intensively due to their polynomial complexity and practical efficiency. Recently, J. Peng et al. [14, 15] introduced so-called self-regular kernel (and barrier) functions and designed primal-dual interior-point algorithms based on self-regular proximity for linear optimization (LO) problems. They have also exte...
Empirical Optimal Kernel for Convex Multiple Kernel Learning
Multiple kernel learning (MKL) aims at learning a combination of different kernels, instead of using a single fixed kernel, in order to better match the underlying problem. In this paper, we propose the Empirical Optimal Kernel for convex combination MKL. The Empirical Optimal Kernel is based on the theory of kernel polarization, and is the one with the best generalization ability which can be ...
Impact of linear dimensionality reduction methods on the performance of anomaly detection algorithms in hyperspectral images
Anomaly Detection (AD) has recently become an important application of hyperspectral image analysis. The goal of these algorithms is to find the objects in the image scene which are anomalous in comparison to their surrounding background. One way to improve the performance and runtime of these algorithms is to use Dimensionality Reduction (DR) techniques. This paper evaluates the effect of thr...
ISAR Image Improvement Using STFT Kernel Width Optimization Based On Minimum Entropy Criterion
Nowadays, radar systems have many applications, and radar imaging is one of the most important of them. Inverse Synthetic Aperture Radar (ISAR) is used to form an image of moving targets. Conventional methods use the Fourier transform to retrieve Doppler information. However, because of the maneuvering of the target, the Doppler spectrum becomes time-varying and the image is blurred. Joi...
Publication date: 2005